X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8CD7D.98E95278@onstor-exch02.onstor.net>; Fri, 13 Jun 2008 10:47:50 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Proposed design for new(ish) boot procedure for Cougar
Date: Fri, 13 Jun 2008 10:47:49 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E8AB0@onstor-exch02.onstor.net>
In-Reply-To: <20080612182458.010d3d89@ripper.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Proposed design for new(ish) boot procedure for Cougar
Thread-Index: AcjM9Ex4rK0Ocl5NQmW0PhkksNsvlwAiEI5g
References: <20080612182458.010d3d89@ripper.onstor.net>
From: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>
To: "Andy Sharp" <andy.sharp@onstor.com>,
	"dl-Design Review" <dl-designreview@onstor.com>,
	"Brian Stark" <brian.stark@onstor.com>,
	"Warren Gale" <warren.gale@onstor.com>

For the independent fp/txrx reboot the main problem is reinitializing
the mgmtbus driver. The reboot code will have to ifconfig down the
mgmtbus, reset fp/txrx, and ifconfig the mgmtbus up after the fp/txrx
reboot. The management bus driver will have to execute the equivalent of
the current startup code when this happens.
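
The sequence above could be sketched roughly as follows. This is only an illustration of the ordering: the interface name `mgmtbus0` and the `reset_cores` callback are hypothetical placeholders, and the command runner is injectable so the ordering can be exercised without touching real hardware.

```python
import subprocess
from typing import Callable, List

# Hypothetical interface name; the real mgmtbus device name is not given here.
MGMTBUS_IFACE = "mgmtbus0"


def run_command(argv: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(argv, check=True)


def reboot_fp_txrx(
    reset_cores: Callable[[], None],
    run: Callable[[List[str]], None] = run_command,
    iface: str = MGMTBUS_IFACE,
) -> None:
    """Take the mgmtbus down, reset FP/TXRX, then bring the mgmtbus back up.

    reset_cores stands in for whatever mechanism actually resets the
    FP/TXRX cores; it is an assumption of this sketch.
    """
    # 1) Quiesce the management bus before the cores go away.
    run(["ifconfig", iface, "down"])
    # 2) Reset the embedded cores (placeholder callback).
    reset_cores()
    # 3) Bring the bus back up; the driver is expected to redo its
    #    startup-equivalent initialization at this point.
    run(["ifconfig", iface, "up"])
```

The driver-side reinitialization on "up" is the hard part and is not modeled here.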

>-----Original Message-----
>From: Andy Sharp
>Sent: Thursday, June 12, 2008 6:25 PM
>To: dl-Design Review; Brian Stark; Warren Gale
>Subject: Proposed design for new(ish) boot procedure for Cougar
>
>                       Cougar Boot Procedure Redesign
>                       ______________________________
>
>Problem
>=======
>
>    Booting takes far too long on Cougar, and in theory the embedded
>    nodes should be rebootable w/o rebooting Linux on the Sibyte 1125.
>
>Reasons:
>    1)    Image load from CF is intolerably slow
>    2)    After image load, Linux boot takes the longest but is the
>          least likely to need rebooting, resulting in an unnecessary
>          bottleneck.
>
>Solution
>========
>
>    Redesign the boot flow to allow the embedded cores to be
>    independently booted if Linux is up.
>
>Proposal
>========
>
>    Take a phased approach to implementing a redesigned boot procedure:
>
>	Phase I
>	-------
>	1)  Change SSC PROM to load and boot only Linux.
>	2)  Change FP/TXRX PROM to write a magic cookie in a
>	    predefined memory location indicating its readiness
>	    for its image to be loaded.
>	3)  Implement an early start Linux daemon that waits for these
>	    boot magic cookies to be set by the embedded cores, loads
>	    their images to the correct memory locations, and signals
>	    to the FP/TXRX when finished.  The FP and TXRX could boot
>            while Linux completes its boot steps.
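
The Phase I handshake above can be sketched as a single polling step of the loader daemon. Everything specific here is an assumption for illustration: the cookie values, the 32-bit little-endian cookie layout, the offsets, and the use of a `bytearray` to stand in for the shared memory region. None of these are defined in the proposal.

```python
import struct

# Hypothetical cookie values; the proposal does not define them.
COOKIE_READY = 0xC0FFEE01   # written by the FP/TXRX PROM: "ready for image"
COOKIE_LOADED = 0xC0FFEE02  # written by the Linux daemon: "image is in place"


def read_cookie(mem: bytearray, offset: int) -> int:
    """Read a 32-bit cookie (little-endian is assumed for this sketch)."""
    return struct.unpack_from("<I", mem, offset)[0]


def write_cookie(mem: bytearray, offset: int, value: int) -> None:
    """Write a 32-bit cookie at the given offset."""
    struct.pack_into("<I", mem, offset, value)


def service_core(mem: bytearray, cookie_off: int, load_off: int,
                 image: bytes) -> bool:
    """One polling step for a single core.

    Returns False if the core has not signalled readiness yet; otherwise
    copies the image into place, signals completion, and returns True.
    """
    if read_cookie(mem, cookie_off) != COOKIE_READY:
        return False
    if load_off + len(image) > len(mem):
        raise ValueError("image does not fit in the memory region")
    mem[load_off:load_off + len(image)] = image
    # Signal only after the image is fully copied.
    write_cookie(mem, cookie_off, COOKIE_LOADED)
    return True
```

A real daemon would call something like this in a loop for each core, against mapped physical memory rather than a `bytearray`.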
>
>	Phase II
>	-------
>	1)  Through testing, determine what needs to be done to allow
>	    FP/TXRX to be rebooted independently without disturbing the
>	    Linux kernel and each other.  Current daemons that
>	    communicate with FP/TXRX are not expected to be much trouble
>	    since they had to handle this for Cheetah, although this has
>	    not been extensively tested on Cheetah in the last few
>	    releases.
>
>Expected Results
>================
>
>Phase I
>-------
>
>Current boot time           Predicted boot time        Predicted savings
>-----------------           -------------------        -----------------
>2 minutes, 57 secs          1 minute, 43.7 secs        1 minute, 13.3 secs
>
>41% reduction in boot time: current boot time* is 2:57, resulting boot
>time is estimated to be 1:43.7, a savings of 1:13.3. The new method
>would boot 1.7 times faster (2 times faster, or twice as fast, would be
>a 50% reduction in boot time).
>
>These estimates are based on the difference in FP/TXRX image load
>time: 86 seconds when loaded by the PROM versus 12.7 seconds when
>loaded by Linux (cold cache).
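
As a quick check of the Phase I arithmetic, the estimate can be recomputed from the three stated inputs (2:57 current boot, 86 s PROM image load, 12.7 s Linux image load). The function name and the assumption that nothing else in the boot path changes are mine, not part of the proposal.

```python
def phase1_estimate(current_s: float, prom_load_s: float,
                    linux_load_s: float) -> dict:
    """Estimate Phase I boot time, assuming only image load time changes."""
    savings_s = prom_load_s - linux_load_s
    predicted_s = current_s - savings_s
    return {
        "savings_s": savings_s,
        "predicted_s": predicted_s,
        "reduction_pct": 100.0 * savings_s / current_s,
        "speedup": current_s / predicted_s,
    }


# Inputs from the proposal: 2 min 57 s current boot, 86 s vs 12.7 s load.
est = phase1_estimate(current_s=2 * 60 + 57, prom_load_s=86.0,
                      linux_load_s=12.7)
for name, value in est.items():
    print(f"{name}: {value:.1f}")
```

With these inputs the savings come to 73.3 seconds and the predicted boot time to 103.7 seconds (1:43.7), which is roughly a 41% reduction, or about 1.7 times faster.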
>
>
>Phase II
>--------
>If just rebooting one or both of the FP/TXRX nodes, boot time is
>estimated to be under 10 seconds.  This would substantially increase
>customer satisfaction and supportability, and substantially improve
>developer efficiency.
>
>
>
>
>
>* Boot time measured from when PROM code starts loading the first boot
>image to when nfxsh CLI is available.
